{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "134ed46e-e9ed-4fa7-b10f-5f6aff495b13",
   "metadata": {},
   "source": [
    "# Inspecting Model Architecture\n",
    "\n",
    "## Objective\n",
    "\n",
    "This tutorial shows several ways to inspect a QMzymeModel and its QMzymeRegions at any point in the workflow.\n",
    "\n",
    "This workflow allows you to:\n",
    "\n",
    "- Examine a QMzymeModel, its QMzymeRegions, and their residues and atoms through printed attributes, summary tables, and a model overview.\n",
    "\n",
    "In this example, the model system is ketosteroid isomerase (KSI). The KSI structure was obtained from PDB entry [1OH0](https://doi.org/10.2210/pdb1OH0/pdb) and MM-minimized prior to this tutorial.\n",
    "\n",
    "## Classes used in this example\n",
    "\n",
    "- [GenerateModel](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.GenerateModel.html)\n",
    "- [QM_Method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.CalculateModel.html#qm-treatment)\n",
    "- [SelectionSchemes](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#)\n",
    "- [DistanceCutoff](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#QMzyme.SelectionSchemes.DistanceCutoff)\n",
    "\n",
    "## Required Files\n",
    "\n",
    "To start, you will need:\n",
    "\n",
    "- A fully prepared and protonated PDB file of the reference protein, with the ligand bound (if applicable)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2f5fa6b-8b88-4d44-b3e6-2e3d4690bad3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import QMzyme\n",
    "from QMzyme.SelectionSchemes import DistanceCutoff\n",
    "from QMzyme.data import PDB\n",
    "import pandas as pd"
   ]
  },
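  {
   "cell_type": "markdown",
   "id": "98997308-5937-4124-9224-a6c82c002986",
   "metadata": {},
   "source": [
    "Before inspecting anything, we build the model system used throughout this tutorial: the ligand (resid 263) is set as the catalytic center, a `DistanceCutoff` region is selected around it, its alpha carbons are fixed, a QM method is assigned, and the region is truncated."
   ]
  },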
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0bfb062b-95ee-4dea-9ecf-4600ae6f6546",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Charge information not present. QMzyme will try to guess region charges based on residue names consistent with AMBER naming conventions (i.e., aspartate: ASP --> Charge: -1, aspartic acid: ASH --> Charge: 0.). See QMzyme.data.residue_charges for the full set.\n",
      "QMzymeRegion cutoff_3 has an estimated charge of -2.\n",
      "\n",
      "Truncated model has been created and saved to attribute 'truncated' and stored in QMzyme.CalculateModel.calculation under key QM. This model will be used to write the calculation input.\n"
     ]
    }
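   ],
   "source": [
    "# Load the KSI structure provided by QMzyme.data into a QMzymeModel.\n",
    "model = QMzyme.GenerateModel(PDB)\n",
    "\n",
    "# Register the charge of the ligand (EQU) so region charges can be estimated.\n",
    "QMzyme.data.residue_charges.update({'EQU': -1})\n",
    "\n",
    "# Define the QM level of theory and the calculation type.\n",
    "qm_method = QMzyme.QM_Method(\n",
    "    basis_set='6-31G*',\n",
    "    functional='wB97XD',\n",
    "    qm_input='OPT FREQ',\n",
    "    program='gaussian'\n",
    ")\n",
    "\n",
    "# Define the regions: the ligand, the whole system, and a distance-cutoff selection.\n",
    "model.set_catalytic_center(selection='resid 263')\n",
    "model.set_region(selection='all', name='full_protein')\n",
    "model.set_region(selection=DistanceCutoff, cutoff=3)\n",
    "\n",
    "# Fix the alpha carbons of the cutoff region.\n",
    "c_alpha_atoms = model.cutoff_3.get_atoms(attribute='name', value='CA')\n",
    "model.cutoff_3.set_fixed_atoms(atoms=c_alpha_atoms)\n",
    "\n",
    "# Assign the QM method to the cutoff region, then truncate the model.\n",
    "qm_method.assign_to_region(region=model.cutoff_3)\n",
    "model.truncate()"
   ]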
  },
  {
   "cell_type": "markdown",
   "id": "1446162f-254f-413a-9582-4868ca105f2e",
   "metadata": {},
   "source": [
    "## Using Print Statements\n",
    "\n",
    "The simplest way to inspect your model is to print its attributes directly. This gives a raw, immediate view of the regions and atoms it contains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "998d29dc-27e5-4ec8-8a18-0a88d5092f81",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[<QMzymeRegion catalytic_center contains 37 atom(s) and 1 residue(s)>, <QMzymeRegion full_protein contains 4258 atom(s) and 324 residue(s)>, <QMzymeRegion cutoff_3 contains 275 atom(s) and 19 residue(s)>, <QMzymeRegion cutoff_3_truncated contains 249 atom(s) and 19 residue(s)>]\n",
      "[<QMzymeAtom 4005: C1 of resname EQU, resid 263>, <QMzymeAtom 4006: O1 of resname EQU, resid 263>, <QMzymeAtom 4007: C2 of resname EQU, resid 263>, <QMzymeAtom 4008: C3 of resname EQU, resid 263>, <QMzymeAtom 4009: C4 of resname EQU, resid 263>, <QMzymeAtom 4010: C5 of resname EQU, resid 263>, <QMzymeAtom 4011: C6 of resname EQU, resid 263>, <QMzymeAtom 4012: C7 of resname EQU, resid 263>, <QMzymeAtom 4013: C8 of resname EQU, resid 263>, <QMzymeAtom 4014: C9 of resname EQU, resid 263>, <QMzymeAtom 4015: C10 of resname EQU, resid 263>, <QMzymeAtom 4016: C11 of resname EQU, resid 263>, <QMzymeAtom 4017: C12 of resname EQU, resid 263>, <QMzymeAtom 4018: C13 of resname EQU, resid 263>, <QMzymeAtom 4019: C14 of resname EQU, resid 263>, <QMzymeAtom 4020: C15 of resname EQU, resid 263>, <QMzymeAtom 4021: C16 of resname EQU, resid 263>, <QMzymeAtom 4022: C17 of resname EQU, resid 263>, <QMzymeAtom 4023: O2 of resname EQU, resid 263>, <QMzymeAtom 4024: C18 of resname EQU, resid 263>, <QMzymeAtom 4025: H1 of resname EQU, resid 263>, <QMzymeAtom 4026: H2 of resname EQU, resid 263>, <QMzymeAtom 4027: H3 of resname EQU, resid 263>, <QMzymeAtom 4028: H4 of resname EQU, resid 263>, <QMzymeAtom 4029: H5 of resname EQU, resid 263>, <QMzymeAtom 4030: H6 of resname EQU, resid 263>, <QMzymeAtom 4031: H7 of resname EQU, resid 263>, <QMzymeAtom 4032: H8 of resname EQU, resid 263>, <QMzymeAtom 4033: H9 of resname EQU, resid 263>, <QMzymeAtom 4034: H10 of resname EQU, resid 263>, <QMzymeAtom 4035: H11 of resname EQU, resid 263>, <QMzymeAtom 4036: H12 of resname EQU, resid 263>, <QMzymeAtom 4037: H13 of resname EQU, resid 263>, <QMzymeAtom 4038: H14 of resname EQU, resid 263>, <QMzymeAtom 4039: H15 of resname EQU, resid 263>, <QMzymeAtom 4040: H16 of resname EQU, resid 263>, <QMzymeAtom 4041: H17 of resname EQU, resid 263>]\n"
     ]
    }
   ],
   "source": [
    "# This outputs all QMzymeRegions currently registered in your QMzymeModel.\n",
    "print(model.regions)\n",
    "\n",
    "# This outputs all QMzymeAtoms in the catalytic_center QMzymeRegion.\n",
    "print(model.catalytic_center.atoms)"
   ]
  },
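  {
   "cell_type": "markdown",
   "id": "7c2e9a41-3d5f-4b8a-9e61-0f4a2b6c8d13",
   "metadata": {},
   "source": [
    "For larger regions, the printed atom list becomes hard to scan. The following is a minimal sketch, not part of QMzyme: `preview` is a hypothetical helper, and it assumes only that `region.atoms` can be iterated like the list printed above. It is demonstrated on stand-in strings so that it runs without a model.\n",
    "\n",
    "```python\n",
    "def preview(items, limit=5):\n",
    "    '''Return reprs of the first `limit` items plus a count of the rest.'''\n",
    "    items = list(items)\n",
    "    lines = [repr(item) for item in items[:limit]]\n",
    "    if len(items) > limit:\n",
    "        lines.append(f'... ({len(items) - limit} more)')\n",
    "    return lines\n",
    "\n",
    "# Stand-in labels for the 37 catalytic-center atoms (numbers 4005 to 4041).\n",
    "atoms = [f'atom {4005 + i}' for i in range(37)]\n",
    "for line in preview(atoms, limit=3):\n",
    "    print(line)\n",
    "```\n",
    "\n",
    "In a live session, `preview(model.catalytic_center.atoms)` could be used the same way, assuming the attribute behaves like a list."
   ]
  },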
  {
   "cell_type": "markdown",
   "id": "617a7fcc-7e40-4fce-a071-d8491c2eb935",
   "metadata": {},
   "source": [
    "## Using region.summarize() with a pandas DataFrame\n",
    "\n",
    "Next, we will examine a region using its `summarize()` method together with pandas. Called on its own, `summarize()` returns a dictionary of per-residue attributes for the region: residue ID, residue name, charge, removed atoms, fixed atoms, and segment ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a05acaef-b6d8-4067-a0b5-631485c1e547",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Resid': [np.int64(263)],\n",
       " 'Resname': ['EQU'],\n",
       " 'Charge': [-1],\n",
       " 'Removed atoms': [[]],\n",
       " 'Fixed atoms': [[]],\n",
       " 'Segids': ['A']}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.catalytic_center.summarize()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19da8105-76ac-41e0-a6cc-38ec3fc11616",
   "metadata": {},
   "source": [
    "As the region grows, so does this dictionary. To make it easier to read, we can pass it to `pd.DataFrame` from the Python package pandas, which turns it into a table with one row per residue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "42b64c82-3107-429e-abb9-aa8e843cc620",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Resid</th>\n",
       "      <th>Resname</th>\n",
       "      <th>Charge</th>\n",
       "      <th>Removed atoms</th>\n",
       "      <th>Fixed atoms</th>\n",
       "      <th>Segids</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>263</td>\n",
       "      <td>EQU</td>\n",
       "      <td>-1</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Resid Resname  Charge Removed atoms Fixed atoms Segids\n",
       "0    263     EQU      -1            []          []      A"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(model.catalytic_center.summarize())\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c11e0914-13b8-41c1-975f-af7a9651c8c1",
   "metadata": {},
   "source": [
    "This is especially useful for a truncated QMzymeRegion, where the table shows which atoms were removed from each residue and which atoms are fixed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "60f2b4db-fdb0-4f5d-b390-91c3487fcc64",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Resid</th>\n",
       "      <th>Resname</th>\n",
       "      <th>Charge</th>\n",
       "      <th>Removed atoms</th>\n",
       "      <th>Fixed atoms</th>\n",
       "      <th>Segids</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16</td>\n",
       "      <td>TYR</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20</td>\n",
       "      <td>VAL</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>40</td>\n",
       "      <td>ASP</td>\n",
       "      <td>-1</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>60</td>\n",
       "      <td>GLY</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>61</td>\n",
       "      <td>LEU</td>\n",
       "      <td>0</td>\n",
       "      <td>[C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>66</td>\n",
       "      <td>VAL</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>86</td>\n",
       "      <td>PHE</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>88</td>\n",
       "      <td>VAL</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>90</td>\n",
       "      <td>MET</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>99</td>\n",
       "      <td>LEU</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>101</td>\n",
       "      <td>VAL</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>103</td>\n",
       "      <td>ASH</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>118</td>\n",
       "      <td>ALA</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>120</td>\n",
       "      <td>TRP</td>\n",
       "      <td>0</td>\n",
       "      <td>[N, H, C, O]</td>\n",
       "      <td>[CA]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>263</td>\n",
       "      <td>EQU</td>\n",
       "      <td>-1</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>372</td>\n",
       "      <td>WAT</td>\n",
       "      <td>0</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>373</td>\n",
       "      <td>WAT</td>\n",
       "      <td>0</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>376</td>\n",
       "      <td>WAT</td>\n",
       "      <td>0</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>378</td>\n",
       "      <td>WAT</td>\n",
       "      <td>0</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>QM</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Resid Resname  Charge Removed atoms Fixed atoms Segids\n",
       "0      16     TYR       0  [N, H, C, O]        [CA]     QM\n",
       "1      20     VAL       0  [N, H, C, O]        [CA]     QM\n",
       "2      40     ASP      -1  [N, H, C, O]        [CA]     QM\n",
       "3      60     GLY       0        [N, H]        [CA]     QM\n",
       "4      61     LEU       0        [C, O]        [CA]     QM\n",
       "5      66     VAL       0  [N, H, C, O]        [CA]     QM\n",
       "6      86     PHE       0  [N, H, C, O]        [CA]     QM\n",
       "7      88     VAL       0  [N, H, C, O]        [CA]     QM\n",
       "8      90     MET       0  [N, H, C, O]        [CA]     QM\n",
       "9      99     LEU       0  [N, H, C, O]        [CA]     QM\n",
       "10    101     VAL       0  [N, H, C, O]        [CA]     QM\n",
       "11    103     ASH       0  [N, H, C, O]        [CA]     QM\n",
       "12    118     ALA       0  [N, H, C, O]        [CA]     QM\n",
       "13    120     TRP       0  [N, H, C, O]        [CA]     QM\n",
       "14    263     EQU      -1            []          []     QM\n",
       "15    372     WAT       0            []          []     QM\n",
       "16    373     WAT       0            []          []     QM\n",
       "17    376     WAT       0            []          []     QM\n",
       "18    378     WAT       0            []          []     QM"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(model.cutoff_3_truncated.summarize())\n",
    "df"
   ]
  },
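  {
   "cell_type": "markdown",
   "id": "e4b81f06-92ad-4c37-b5d0-6a1f3c9e7b24",
   "metadata": {},
   "source": [
    "The per-residue charges in this table can be cross-checked against the region charge of -2 that QMzyme estimated when the model was built. The following is a minimal sketch that assumes `summarize()` returns a dictionary of equal-length lists, as in the output shown earlier. The values are hand-copied from the table above so that the snippet runs without a model.\n",
    "\n",
    "```python\n",
    "# Hand-copied from the table above; in a live session this could instead be\n",
    "# summary = model.cutoff_3_truncated.summarize()\n",
    "summary = {\n",
    "    'Resid': [16, 20, 40, 60, 61, 66, 86, 88, 90, 99,\n",
    "              101, 103, 118, 120, 263, 372, 373, 376, 378],\n",
    "    'Resname': ['TYR', 'VAL', 'ASP', 'GLY', 'LEU', 'VAL', 'PHE', 'VAL', 'MET', 'LEU',\n",
    "                'VAL', 'ASH', 'ALA', 'TRP', 'EQU', 'WAT', 'WAT', 'WAT', 'WAT'],\n",
    "    'Charge': [0, 0, -1, 0, 0, 0, 0, 0, 0, 0,\n",
    "               0, 0, 0, 0, -1, 0, 0, 0, 0],\n",
    "}\n",
    "\n",
    "# Sum the residue charges to get the net charge of the region.\n",
    "total_charge = sum(summary['Charge'])\n",
    "\n",
    "# Collect the residues that carry a nonzero charge.\n",
    "charged = [\n",
    "    (resid, resname, charge)\n",
    "    for resid, resname, charge in zip(\n",
    "        summary['Resid'], summary['Resname'], summary['Charge']\n",
    "    )\n",
    "    if charge != 0\n",
    "]\n",
    "\n",
    "print(total_charge)\n",
    "print(charged)\n",
    "```\n",
    "\n",
    "The sum is -2, contributed by ASP 40 and the EQU ligand, which agrees with the estimated charge reported for `cutoff_3`."
   ]
  },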
  {
   "cell_type": "markdown",
   "id": "32303981-63ff-4523-86e3-0dda68c27ec3",
   "metadata": {},
   "source": [
    "## Using print_overview()\n",
    "\n",
    "For a higher-level view of the system, call the `print_overview()` method on the model. It reports the total atom, residue, and region counts of the QMzymeModel, followed by the size, assigned method, and selection scheme of each QMzymeRegion, so it serves as a diagnostic report for the whole workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b3222228-33b2-410e-826e-36090ac76e77",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-----------------------------\n",
      "Model Overview: 1oh0 \n",
      "-----------------------------\n",
      "  - total atoms: 4258\n",
      "  - total residues: 324\n",
      "  - total regions: 4\n",
      "-----------------------------\n",
      "Region Overview\n",
      "-----------------------------\n",
      "Region Name: catalytic_center\n",
      "  - atoms: 37\n",
      "  - residues: 1\n",
      "  - method: None\n",
      "  - selection_scheme: resid 263\n",
      "-----------------------------\n",
      "Region Name: full_protein\n",
      "  - atoms: 4258\n",
      "  - residues: 324\n",
      "  - method: None\n",
      "  - selection_scheme: all\n",
      "-----------------------------\n",
      "Region Name: cutoff_3\n",
      "  - atoms: 275\n",
      "  - residues: 19\n",
      "  - method: {'type': 'QM', 'qm_input': '6-31G* wB97XD OPT FREQ', 'basis_set': '6-31G*', 'functional': 'wB97XD', 'qm_end': '', 'program': 'gaussian', 'freeze_atoms': [2, 23, 39, 51, 58, 77, 93, 113, 129, 146, 165, 181, 194, 204], 'mult': 1, 'charge': -2}\n",
      "  - selection_scheme: DistanceCutoff\n",
      "  - cutoff: 3\n",
      "-----------------------------\n",
      "Region Name: cutoff_3_truncated\n",
      "  - atoms: 249\n",
      "  - residues: 19\n",
      "  - method: {'type': 'QM', 'qm_input': '6-31G* wB97XD OPT FREQ', 'basis_set': '6-31G*', 'functional': 'wB97XD', 'qm_end': '', 'program': 'gaussian', 'freeze_atoms': [2, 23, 39, 51, 58, 77, 93, 113, 129, 146, 165, 181, 194, 204], 'mult': 1, 'charge': -2}\n",
      "  - selection_scheme: truncated from cutoff_3\n",
      "-----------------------------\n"
     ]
    }
   ],
   "source": [
    "model.print_overview()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}